An algorithm for detection of extreme mass ratio inspirals in LISA data 
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The gravitational wave signal from a compact object spiralling toward a massive black hole (MBH) 
is thought to be one of the most difficult sources to detect in the LISA data stream. Due to the 
' ' ■ large parameter space of possible signals and many orbital cycles spent in the sensitivity band of 

SsJ , LISA, it has been estimated previously that of the order of 10^^ templates would be required for a 

. fully coherent search with a template grid, which is computationally impossible. Here we describe 

^■"^ ' an algorithm based on a constrained Metropolis-Hastings stochastic search which allows us to find 

and accurately estimate parameters of isolated EMRI signals buried in Gaussian instrumental noise. 
. We illustrate the effectiveness of the algorithm with results from searches of the Mock LISA Data 

Challenge round IB data sets. 

(N 

O^' I. INTRODUCTION 

^ ! 

Extreme mass ratio inspiral (EMRI) — a stellar mass compact object (CO) that is captured and spirals into a 
massive Black Hole (MBH) through the emission of gravitational radiation — is one of the most interesting sources 
^-H for the future LISA mission The inspiral proceeds very slowly for these sources (the inspiral rate is proportional 
, to the mass ratio which is typically 1 : 10^) so the CO spends a significant amount of time in the strong field close to 
■ the MBH. The GW signal contains information about the geometry of the central hole which, in the case of a strong 
signal, could be extracted to confirm or otherwise that the massive objects observed in galactic nuclei are indeed Kerr 
BHs, as we suppose 0, H, 0, H, [l] . EMRI observations may also be used to probe the stellar population in the central 
parsecs of galaxies, and measure the properties of astrophysical black holes to high precision Q. This is possible 
because the signal is long lived and so a coherent phase integration should recover the parameters of the binary to 
high accuracy, much better than anything that will be available from electromagnetic observations. We refer the 
[ reader to the review article for more details on the astrophysics that will be possible with EMRIs. 

In order to scope out issues associated with LISA data analysis for EMRIs, we require waveform models that are 
cheap and easy to generate but still capture the main features of true EMRI waveforms. One such model of the signal 
is the so-called analytic kludge waveform Q. It is a phenomenological template, constructed by piecing together 
$H ' the most important physical elements: post-Newtonian expressions for the rate of change of the orbital parameters 
and frequencies (Peter-Mathews approach) , periastron precession, and precession of the orbital plane around the spin 
axis of the MBH. While these waveforms are not faithful representations, they are nonetheless representative of the 
true signal. This means that they should be sufficient to answer questions about what accuracy we can achieve in 
estimating the source parameters and at what level the confusion noise from cosmological EMRIs will be 0, [lo| . 
Because these waveforms are simple and fast to generate they were chosen for use in the Mock LISA Data Challenge 
(MLDC). The MLDC was organized to stimulate the development of data analysis tools for LISA and to establish 
standard notations and conventions which allow comparison of different algorithms. There have been three challenges 
to date [m, [13, IHj [3 EBl J which were aimed at different sources: SMBH binaries, Galactic white-dwarf binaries and 
EMRIs. For the two EMRI challenges five data sets were released, each containing a single EMRI signal buried in 
instrumental noise. 

The five EMRI data sets had some parameters drawn from priors common to all data sets, which were: the mass 
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of the CO ^ G C/[9.5, 1O.5]M0, the spin S/M"^ e C/[0.5,0.7], the plunge time U[\,2] years and the eccentricity at 
plunge epi S ?7[0.15, 0.25]; and some parameters drawn from priors that were different for each data set: MBH 
mass M e C/[0.95, 1.05] x IO'^Mq (high mass binary) with SNRe C/[40, 110] (1.3.1 data set), M e J7[4.75, 5.25] x 
lO'^M© (medium mass binary) with SNRg C/[70, 110] (1.3.2 data set) and with SNRe C/[40,60] (1.3.3 data set), 
M G C/[0.95,1.05] X IG^M© (low mass binary) with SNR e t/[70, 110] (1.3.4 data set) and with SNR e C/[40,60] 
(1.3.5 data set). Note that the last two types of signal are considered to be the most likely EMRIs to be seen by 
LISA: a ~ lOM© BH falling into ~ IQ^Mq MBH in the galactic center [Tj. More details on the parameter sets can be 
found in [l^. MLDC round 1^ and round IB had the same sets of priors for the five data sets. For both challenges, it 
has been shown that the signal can be easily detected with high confidence, but with parameters quite different from 
the true ones. 

An important feature of EMRI signals is that they have many local maxima in the likelihood surface, which are 
quite well separated and can be as high as 75% of the true maximum. Search algorithms have a tendency to find 
secondary inaxima quickly and then get stuck there. This represents a true detection but with incorrect parameters 
(see [H, llBl for results). However in this article we will only regard a "detection" as finding the global maxima in 
the likelihood, i.e., the true source parameters. Secondary maxima are the biggest problem for LISA data analysis. 
Some signals can be seen by eye in a spectrogram or in the power spectral density of the data — the main problem is 
to estimate the source parameters with the best possible accuracy. A grid based search (with a sufficiently fine grid) 
would be guaranteed to find the global maxima as it covers the whole parameter space, but the required number of 
templates is so high [l^j that no one is presently considering doing it even with tight priors like in the MLDC. An 
alternative approach which has proven to be both efficient and accurate was first suggested in the context of LISA for 
non-spinning SMBH binary searches [H, [l^, [13] . This approach is the semi-stochastic Metropolis-Hastings Monte- 
Carlo (MHMC) method, where one constructs a search chain through the parameter space (these are not in general 
Markovian) and follow this up by a Markov chain Monte-Carlo (MCMC) to sample the posterior distribution function. 
We say this approach is 'semi-stochastic' since, although successive points in the chains are chosen at random, they 
are chosen from directed proposal distributions. This method involves generating templates as the chain moves, but 
its power lies in the fact that the number of points usually required to find the source is many fewer than in a full 
template grid. However, the chains can get stuck on local maxima. One needs to use the properties of the signal to 
make chains move off local maxima and explore the parameter space more widely. 

By including tricks such as simulated annealing — "heating" the likelihood surface by scaling both the log-likelihood 
and the size of proposed jumps by a temperature factor — and frequency annealing — systematically increasing the 
range of frequencies included in the waveform template — MHMC has solved the search problem for non-spinning 
SMBH binaries [H, [H, [10] . The full utility of MHMC for EMRI searches has so far not been demonstrated. In the 
round IB MLDC release, one of the five EMRIs (of 1.3.1 type) was found by an MHMC technique [2l| that used 
simulated annealing and began with searches on sub-segments of the data that were combined before finally running 
a long chain on the whole data set. The authors [12] also attempted to search for EMRIs in the round IB data using 
MHMC, and recovered close to the true parameters for one of the five signals (of 1.3.2 type) before the Challenge 
deadline. However, in both round 1 and round IB, the best performing algorithm was not MHMC based, but a 
time- frequency analysis (23l . [2^ . Such searches are easier to implement and parameter estimation can be done if the 
EMRI is isolated and of sufficient brightness. However, the achievable accuracy of parameter estimation using such 
template- free techniques is not as good as template based methods, and the algorithm will suffer in the presence of 
multiple source confusion. 

In this paper we describe for the first time a complete template-based EMRI search that is able to detect and 
recover accurate parameters for bright, isolated EMRI sources buried in instrumental noise, with parameters drawn 
from any of the five canonical MLDC EMRI source types. This search technique is based on our previous MHMC 
search, but with several improvements which we describe below. 

The origin of all of the secondary maxima in the EMRI likelihood surface is in the characteristics of the signal. An 
EMRI signal is composed of many harmonics of the three fundamental orbital frequencies (of the radial r-motion, 
the azimuthal ^-motion and the polar ^-motion), which are evolving in time. These harmonics vary in strength 
(amplitude) and the local maxima arise from matching the phase of the strongest (or of several strong) harmonics 
for some period of time. It is possible for a signal with very different parameters to match the dominant harmonic 
very well for the whole duration of the signal but miss completely all the other harmonics. We have tried to exploit 
this property by using several chains to identify the dominant harmonic and then impose a constraint between the 
fundamental frequencies that fixes the frequency of the dominant harmonic at some reference time. The key idea of 
our search is to determine the frequency of the dominant harmonic/harmonics using several local maxima and this 
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was used for the IB submission [23]. However, since the round IB MLDC deadline, we have improved our search 
technique in three important ways. We have changed the parametrization of the signal, so that it is specified by the 
three orbital frequencies at some reference time, tref, and we have changed the proposal distribution accordingly. We 
use two main proposal distributions: a normal multivariate in the eigendirections of the Fisher Matrix and a variation 
of the Metropolis random walk which we will describe later. The second important improvement was to release the 
constraint after a certain point and let the chains correct the frequency of the dominant harmonic at tref ■ Finally, we 
have also improved the efficiency of generation of the templates by a factor of 3-5 which has allowed us to implement 
an analytic maximization of the likelihood over the initial phases, which reduces the parameter space that must be 
searched. This is possible because we have developed a new type of template composed of N independent harmonics 
with frequency evolution defined from the analytic kludge model. From this model we can construct an iV-dimensional 
F-statistic [25|. This will be described later. These improvements have led to the success of the algorithm. We have 
analysed the "blind" data sets (the data sets which MLDC participants were supposed to analyze and return results 
for) from MLDC round IB to tune the algorithm, and then analysed two other data sets using the search pipeline in 
a blind analysis. We successfully found the signal and determined the true parameters of the source for each of the 
seven data sets. The results of our search are summarized in Tables H] and to follow. In the following sections we 
give details of the search algorithm. 

The paper is organized as follows. In section |TT] we describe the signal model we have constructed for our search 
templates. The details of our search method are given in section HTTl We discuss the results of our search in section lTVl 
before concluding with a summary in section |Vl 



II. WAVEFORM MODEL 



The analytic kludge model of EMRI signals, as used to generate the data sets for the MLDC is described in 
and the particular implementation used for the MLDC can be found in [l^. For our search, we have simplified the 
model in order to reduce computational time. The signal can be described by harmonics of three fundamental orbital 
frequencies: i', = "f /(2Tr), fa = d/(27r), where a dot denotes a derivative with respect to time. The frequencies 
evolve according to the PN expressions 
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The harmonic structure of the signal is best seen when using a static source frame defined by the spin of the MBH 
which is assumed to be constant in this model. The radiative frame is then constructed using the direction of 
propagation (or direction to the source from the solar system barycenter (SSB)) and the spin direction of the MBH. 
The advantage of those two frames is that they are static and all the time dependence is encoded in the amplitude and 
phases of the harmonics explicitly. In the original analytic kludge paper Q , the waveform was expressed relative to a 
precessing frame, tied to the orbital angular momentum, which makes it more complicated to compute the harmonic 
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decomposition. In the static SSB frame, the signal takes the following form 

l.n,ra 

The amplitude of each harmonic depends on the source location (ecliptic coordinates), orientation of the spin and 
the orbital eccentricity. These expressions are known analytically, but are messy so we do not include them explicitly 
here. We have examined the amplitudes of the harmonics for a wide range of parameters and find that harmonics of 
the perihelion precession with I ^ 2 are significantly suppressed. We can also neglect the contribution from harmonics 
of the orbital frequency with n > 5 for orbital eccentricities less than e ^ 0.65. Moreover, by construction, the 
analytic kludge waveforms are quadrupolar and therefore only harmonics of the orbital plane precession frequency 
with m S [—2, 2] are allowed. This will not be the case for real EMRI signals and more sophisticated models include 
higher multipoles [l^, Knowledge of the analytic form of the harmonic amplitudes and the restriction of the 
number of harmonics, to as few as ~ 4-8 dominant harmonics in most cases, allows us to simplify the template and 
make its generation more efficient. The amplitudes of the harmonics depend on Bessel functions, with argument, 
ne(i), that is usually small, so a further simplification follows by expanding these as Taylor series and truncating at 
the desired level of accuracy. 

The technique of Time-Delay Interferometry (TDI) [2^ will be used in order to cancel the laser noise in the 
LISA data. The basics for this technique is to combine the data sent and received by different spacecraft with time 
delays chosen to cancel the common laser noise component. The LISA response function is consequently somewhat 
complicated, although it can be significantly simplified in the long wavelength limit ujgwIj ^ 1 [2§| {L is LISA's arm 
length ~ 16. 7s^ and ojgw is the GW frequency). For an EMRI into a lower mass MBH (1.3.4/1.3.5 type source), the 
frequency of the GWs can be quite high and neither the long wavelength nor rigid adiabatic approximations [29| are 
valid. Our code uses the full response but with time delays applied only to and not to the LISA motion — 

treating LISA as a solid rotating triangle. This is overkill for the high mass MBH EMRIs (1.3.1 type) (and probably 
for the medium mass, 1.3.2 type, sources as well), but we decided to use the same codes for all searches. We have saved 
on computing time for the higher mass systems by using a lower sampling rate during the integration of the orbital 
motion and then up-sampling while generating the TDI streams. We also use linear interpolation to compute the 
time delayed data from a regularly space SSB time series, rather than a more complicated and expensive interpolation 
scheme. 

We have verified our waveform templates against full analytic kludge templates generated using SyntheticLISA [30l | , 
by computing the overlap, which is the inner product, 

between two normalized (s|s) = {h\h) = 1 signals. In the expression above, a tilde denotes the Fourier transform, 
and Sh{f) is the one-sided noise power spectral density. We have found that the overlap between our approximate 
model and the accurately computed templates is in the range [0.93 — 0.99] depending on the source parameters, in 
particular the mass of the MBH (the overlap is usually higher for high mass MBH EMRIs). The loss in overlap comes 
primarily from mismatches in the amplitude, while the phase is tracked very well. This is to be expected, as we do 
not make any approximations in our computation of the evolution of the orbital parameters and frequencies. The 
small mismatch between the template and the signal will lead to a bias in the parameters estimated for the signal. 
A mismatch in amplitude will primarily affect the estimated signal-to-noise ratio/luminosity distance for the source, 
while a phase error will lead to errors in all of the intrinsic parameters. We have found that our model is very faithful, 
with typical model-induced parameter errors for MLDC source types being ~ 1-2ct, where a is the parameter error 
as estimated from the Fisher-Matrix. In other words, we expect the model-error to be of similar size, but no larger 
than the error in parameter recovery that arises from instrumental noise in the detector. This is confirmed by the 
results of our search summarised in Tables HHIIl We see that our parameter recovery was very good, except for the 
SNR which was as much as ^ 5% different in two of the low mass MBH cases. 

III. SEARCH ALGORITHM 

In this section we will describe the overall search algorithm. In practice, there were some differences between the 
searches for each source and we will discuss these source-specific details in the next section. Our search consists 
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of three steps. In the first step we look for the "footprints" of the signal, and for points from which we can seed 
our subsequent chains. In the second step we construct chains using the identified properties of the signal, via a 
constrained Metropolis Monte-Carlo search on the half year long segments of data. The final step is to narrow down 
the parameters of the signals b y e xtending the duration of the templates to the total length of observation (this is 
similar to what was done here j21|). In the following subsections we will give details on the implementation of the 
three steps. 



A. Uniform Jumps 

As mentioned previously, the basis of our search method is to identify as many strong local maxima in the likelihood 
as possible and then use the information encoded in the points to direct the search toward the true solution. The 
first step is very simple and remarkably efficient. In spirit it is similar to using a random template bank, as used in 
(Slj . We generate template waveforms for the last half a year of inspiral before plunge by integrating the equations 
of motion backwards), and with parameters randomly chosen from within uniform priors. For greater efhciency we 
include maximization of the log likelihood over the distance, plunge time and three orbital phases at plunge. We 
maximize over the plunge time in the usual way, by computing the correlation of the template with the data (instead 
of the inner product). The maximized value of the plunge time is then used for constructing a new filter and we 
compute the likelihood maximized over phases and distance. The maximization over the distance is done in the usual 
way: the log-likelihood (up to a constant factor) is given by 

A = - Y^i^i - hi\xi - hi) - 2(a;/|/i/) - ihi\hi), (7) 
I I 

where / — {A, E} runs over orthogonal TDI streams [28l | which play the role here of independent detectors, xj = nj+sj 
is the corresponding TDI data which is combined out of the noise nj and a signal s/; hj is a template and the inner 
product is defined in Eq. ([6]) above. 

An amplitude factor, A (inversely proportional to the luminosity distance to the source) can be factored out hj = Ah 
and maximized over analytically. The maximum likelihood estimator for the amplitude is 

^ = (8) 



and then the maximized log of likelihood is 



^max [A.) 



Eiihilhi) 
J2iixi\hi) 



2 

(9) 



This value is sometimes referred to as SNR^, since if sj = hj, it reduces to '^j{hi\hj), which is the square of the 
matched-filtering signal-to-noise ratio. Note that Eq. ^ is not sensitive to the sign of the inner product. 

Maximization over the three initial orbital phases is more involved. We consider a template which is constructed out 
of three bright harmonics only. The brightness of each m-harmonic for a given set of source parameters depends only 
on the inclination angle A and on the inclination of the MBH's spin to the direction to the source from the SSB [2J]. 
The prior range on plunge eccentricity ensures that we would usually have n = 2 and/or n = 3 as the dominant 
harmonics. This reasoning suggests we take the following harmonics hq = ni = 2, mg ^ rrii, ^2 = 3, TO2 = mo, 
with mo the brightest of mo, mi. The three initial phases for harmonics /i^'-* are 

% ^ n,(j)o + 2jo + niiao- (10) 

Each harmonic is of the form cos($q + 4>'^{t)) and hence may be decomposed as 

h^''> = Aj,cos$j,/i(^)(0) - Aj,sin$^/iW(7r/2), (11) 

here U'^ (0) means taken at zero initial phase. The three-harmonic template can therefore be written 

5 

h'^ = + + = ^ a,h^. (12) 
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Omitting all cross harmonic terms we can analytically maximize the likelihood of our template h'^ over all values of 
the constants aj, in a similar way to the i<"-statistic, 

_EiMfm _ Ei(.^i\hfH7r/2)) 

where i = 0,1,2. This leads to the following maximum likelihood estimators for the amplitude and phase of the 
harmonics: 



$i=arctan(^^j, A^o - V^f + ^ (^^^ 

With the choice of harmonics given above, we can obtain the initial orbital phases from the maximum likelihood 
estimates of the harmonic amplitudes and phases: 



1 

7o = 77 



^0 ^ h 



(15) 
(16) 



ao = . (17 

mo — TOi 

After a new plunge time is determined from the correlation analysis, we estimate the initial phases using the above 
method. This guarantees that wc have chosen the optimal phases if at least one of the harmonics in the template 
matches the signal. We then use the maximised phases to compute the log-likelihood, Eq. ([9]). For this stage of the 
search, the longer we run the more good points we get. We typically use about 100 - 200 CPUs for several days in 
this phase, and normally identify a few dozen distinct secondaries with SNR of about 20% - 40% of the maximum. 



B. Search on subsets of data 



The next stage is to split the data into half-year long segments and to run a Markov chain Monte Carlo (MCMC) 
using the Metropohs rejection/acceptance rule [111- The MCMC technique works as follows: given a data set s{t) and 
a set of templates h{t; x), we choose a starting point, x, in the parameter space. We then propose a jump to another 
point, y, in the space by drawing from a certain proposal distribution, q{y\x), and evaluate the Metropolis-Hastings 
ratio 

^ ^ T^iy)pis\y)qix\y) ^^^^ 
TT{x)p{s\x)q{y\x)' 

Here 7r(af) are the priors of the parameters, which, in our analysis, were taken to be uniform distributions within the 
ranges allowed by the MLDC. The function p{s\x) is the likelihood 

pis\x) = Ce-<''-''(^)l''-''(^)>/Q, (19) 

where C is a normalization constant and = 2 without annealing. This jump is then accepted with probability 
a = min(l, H), otherwise the chain stays at x. In our search, we use the Metropolis rejection/acceptance rule which 
simplifies the above by assuming the proposal q{y\x) is symmetric, so the ratio (jlSp is just the product of the likelihood 
ratio with the prior ratio. In this stage we also include simulated annealing, which means that O is allowed to vary 
from 2. This has the effect of smoothing and flattening the likelihood surface, which makes it easier for the chain to 
move around and climb up the surface to the maximum. The idea is to have a high heat initially, to encourage the 
chain to explore widely and find the global maximum, then cool the surface so the chain locks into the vicinity of the 
maximum. We vary the temperature as the chain advances according to a schedule of the form 

^ _ „ ^ / (SNRo/SNR)^ SNR < SNRo 
^ " ^ 1 1 SNR > SNRo ' 



where SNRq is typically 6 — 7. This annealing scheme is used at the beginning of the search and helps to find the 
trace of the signal quickly by exploring widely in the large parameter space. At later stages of the search we also made 
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use of thermostated annealing, as described in [19|, |20| , which encourages the search chains to explore the vicinity of 
identified maxima. 

At this stage of the search, we employ a different parametrization of the template, by prescribing the three orbital 
frequencies at some reference time t^e/ (usually chosen in the middle of the segment) instead of the mass and spin 
of the MBH. Then we start p chains, where p is the number of interesting points found in step one. The second step 
is based on the assumption that the points found in the first stage are not far in the parameter space from maxima 
(local/secondary or global/primary) on the likelihood surface. The MCMC is efficient at finding the maxima. We 
have been unsuccessful in attempts to make the chains efficiently jump between local maxima until they find the 
global one. Instead we will show how we can use the information stored in each maximum to guide the search in the 
right direction. 

We use two main proposals, q{y\x), in the MCMC: (i) jumps within the scaled ambiguity ellipsoid (defined by the 
eigenvectors and eigenvalues of the variance-covariance matrix); (ii) jumps which (almost) preserve the frequency of 
the dominant harmonic. Let us give some more details. Following [33l. |34| we introduce the metric on the parameter 
space: 

ds'^ = g^^^dx^^dx", g^^ = {h^f_,\h^^), (21) 

where h^p, = dh/dx^ and x^^ are parameters of the template. We can determine from the metric its eigenvectors, V,, 
and eigenvalues, Ai, and hence write g = V^LV,, where V is the matrix of eigenvectors and L is the diagonal matrix 
of eigenvalues. We introduce a new parametrization: 

ds^ = dW^dW, dWi = dY,^/\ dY = V'^dX 

where dX is a vector of parameters. We choose a value of ds^ according to a gaussian distribution, |A/'(0, 1)|, and 
choose the direction vector dW randomly oriented on the hyper-sphere with radius ds. In other words this proposes 
jumps on the surface of the ambiguity ellipsoid scaled according to the chosen ds. Another similar proposal which 
was used for the search of some data sets makes normal jumps in the eigendirections. This latter proposal was first 
suggested in Note that for both proposals, the jumps are further scaled by the temperature when using the 

simulated annealing scheme, to ensure larger jumps when the surface is "hot" . 

The second proposal which we have found to be efficient at the beginning of the chain is based on an estimation of 
the frequency of the dominant harmonic. For each of the high SNR points identified in the first step we can compute 
the frequency of all harmonics at the reference time, tref- In general, all of the points agree on the frequencies of 
the dominant harmonics, with a small dispersion, a. An example from our blind search is shown in Figure [1] One 
can clearly see that all the points agreed about the frequency of the harmonic 1 = 2, m = 2. The scatter reflects 
the relative amplitude of the harmonics: the weakest will have the largest dispersion. One notices that some points 
managed to match the m = 2 harmonic of the signal with m = 1, m = or even m = — 1 harmonic of the template 
(with completely wrong parameters). The idea of this proposal is to choose i^;=2,m=2 according to a distribution 
N{Fi^2,m=2, C2,2), whcrc Fi^2,m=2 and (72,2 are the mean value and the standard deviation estimated from the points 
found in the first stage. Values for and f~f are chosen from normal distributions in the same way. The value for v 
is then defined from the constrain: v = -Fi=2,m=2/2 — f-y — fa- This proposal works well in the beginning of the chains 
to refine the frequencies given other parameters. 

There are two reasons why we carry out the search on short duration segments at first: (i) it is much faster to 
generate small templates, so we can have longer chains, (ii) the accuracy of estimating the eigenvalues and eigenvectors 
of the Fisher Matrix in our main proposal drops with an increase in the duration of the template. 

Once the search reaches a static state, when the chains explores the posterior distribution around the local maximum, 
we stop the chains. The next step is to understand what we have detected. For this purpose we change the full template 
to a phenomenological template which consists of N independent harmonics similar to what we have used for the 
phase maximization (|12p . We assume that the inner product between harmonics is zero, which is not really true, 
however it is a reasonable approximation given the simplification that it allows. Each harmonic can be maximized 
over its amplitude and phase the same way as in (I13p , (I14p . For the physical template described in section |TT1 the 
amplitudes are functions of the spin orientation and the distance, so the amplitudes and phases are not independent. 
But, using this template we maximize over the amplitude and the phase for each harmonic independently (we call this 
a generalised i^-statistic) . This process tells us which harmonics of the template are actually detecting part of the 
signal. We claim a detection with a harmonic if the SNR > 5. Different chains detect different harmonics, although 
in the majority of cases they detect the dominant one. Usually it is easy to identify which harmonics have been 
detected by plotting a figure similar to Figure [U although frequently the indices of the harmonic in the template do 
not correspond to the indices of the harmonic of the signal that it has matched. In some cases, the chains do not 
agree on the identification of the harmonics and we cannot tell, for instance, whether the dominant harmonic of the 
signal is m = 2 or m = 1. In this case we run further analysis for both possibilities. Once the harmonic index of 
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FIG. 1: Harmonics I — 2, m = [—2, 2] for the best points identified in tfie first stage of the search. The frequencies are computed 
at tref in the middle of the third half year. The dominant harmonic is m = 2 (triangles down), the one with the smallest 
dispersion. 

the detected harmonics have been inferred, we apply a least squares fit to determine the three fundamental orbital 
frequencies. Harmonics with high SNR do not always lie closer to the true frequencies than lower SNR points. We see 
sometimes that lower SNR points matches the frequency of the harmonics very well, but fail to fit the other waveform 
parameters, so the derivatives of frequency do not match. 

For the next step we force the orbital frequencies to be fixed at the values estimated from all the chains. We do 
this by turning off the jumps in the orbital frequency until the SNR has reached a value which is better than the 
best attained by any of the chains prior to fixing the orbital frequencies. We then release the constraint. The aim of 
this procedure is to first find a good guess for the frequencies, then refine the other waveform parameters (hopefully 
close to the true ones), then refine our estimate of the frequencies again and so on. If the initial guess is not too bad, 
a high SNR is achieved quite quickly when re-adjusting the other parameters. We repeat the procedure "determine 
frequencies - fix them - release" several times if required, but this iteration need not usually be repeated more than 
three times. In the figure we show the final result of a run on the data from the low mass binary (1.3.4 type). In 
this example, the signal was found in the first and third half years of the data after only one iteration. 

Before we conclude this subsection we should reiterate that we need to detect at least three harmonics in order to 
be able to estimate the orbital frequencies. Moreover two of these must have different n-number and two must have 
different m and the same n. If the harmonics detected are not sufficient to determine the orbital frequencies, one 
can just use the frequency-refining proposal described above or use an iV — /c-harmonic template with the k already 
identified harmonics excluded. This forces the search to look for other harmonics. 



1. Parameter finalization 



A good indication that we have found the signal is that the highest SNR chains cluster in parameter space. In other 
words, if all of the chains with SNR close (say ^ 90%) to the maximum SNR found across all chains, have similar 
parameters. A further indication is that these "best" parameters are approximately the same for the different data 
sub-segments. A good example was shown in Figure [2] above, where we see that the first half-year and third half-year 
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FIG. 2: Non-blind search for the low-mass binary. Results are shown for the search of the first (top) and third (bottom) 
half-year long segments. We show SNR versus parameter value for all chains for two cases: the inclination angle, A, (left) and 
the MBH spin parameter (right). The vertical line in each plot is the true value of the parameter. 



searches are producing similar results. 

At this stage, the accuracy of the recovered parameters is relatively poor because we have been searching shorter 
data segments — the total SNR is therefore lower and there is greater degeneracy between waveform parameters over 
a short duration of signal (one can accurately fit a small number of cycles in many distinct ways). The final stage of 
the search involves first reparameterizing the templates from all the chains in the different segments of data by their 
frequencies etc. at a common reference time (usually ire/ = 0) and then increase the duration of the template first 
to one year and then to two. We then run an MCMC search with these longer waveforms, with chains starting at 
each of the high SNR points identified in the previous stage of the search. In Figure [3] we show how the parameter 
estimation improves as we increase the template duration from one half-year to two years. 

In this stage, we again use the generalised F-statistic with our A'^-harmonic template to begin with. This reduces 
the parameter space as it does the maximization over spin orientation analytically and so it is more efficient to use 
than the physical model. There are two caveats to using this template, however, (i) If the number of harmonics 
included is large it is very slow, as we need to maximize the likelihood for each harmonic. However, 5-8 harmonics 
are usually enough to build up an SNR that is comparable to the full 25 harmonic SNR. (ii) The maximization leads 
to a smoother but larger ambiguity (error) ellipsoid in parameter space. The smoothness helps the chains to reach 
the global maxima, but the fact that the maximum is quite fiat gives a larger error in parameter estimation. This 
is indicated in Figure [3] by the wide spread in parameter values in the search with half-year long templates. To 
finally improve the parameter estimation we need to finish the search by using the full physical template after we 
have found global maxima using the A-harmonic template. To seed this final analysis, we need an estimate of the 
spin orientation. This can be obtained from the estimated amplitudes of the harmonics, but the analytic expressions 
are so complicated that it would have to be done numerically. Instead, we compute the likelihood for various spin 
orientations and take the one giving the highest value. However, there is a four- fold degeneracy in the angle </)if , and 
a complete degeneracy in Ok- This is shown in Figure 2] for the blind search for the high mass MBH EMRI. This plot 
is colour-coded by SNR, and the points were chosen randomly from a uniform distribution over the 6k — (pK plane. 
One can see that it is hard to distinguish between four different values of 4>k and it is almost flat in Ok- To deal with 
this, we ran chains for several different choices of 9k, <t>K and in the end took the chain with highest SNR, which 
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FIG. 3: Improvement in parameter estimation in the blind search for the low mass MBH EMRI (1.3.4 type) as we increase the 
duration of the template from half a year (black points) to one year (red points) to two years (green points). 



did find the correct values. 



IV. RESULTS 



In this section we will describe some source-specific peculiarities encountered while we were analyzing the data sets. 
In each case, the signal was detected at a different stage of the search algorithm described in the previous section, 
and we will attempt to explain why this was the case. This section will be divided into two subsections. The first one 
is dedicated to the non-blind searches which were the basis for the algorithm development and tuning. The second 
subsection gives details of the "blind" searches we did to test the search pipeline. 



A. Round IB analysis 

While developing the algorithm, we analyzed the five "challenge" data sets that were released within the MLDC 
round IB. The bulk of the search tuning and development was done after the submission deadline, so we knew 
parameters of the signals. We used our knowledge of the parameters only to identify when the search was going in the 
wrong direction so that we could try other techniques and to identify when we had detected the signal. The results 
of these searches, for the intrinsic source parameters, are summarized in Table |T1 

The signal was detected at different stages of the search in the various cases. The easiest signals for this algorithm 
to find appear to be those from medium mass MBHs (data sets 1.3.2 and 1.3.3). We believe the reason for this is 
that the frequency evolution of the harmonics is not so great that the harmonics are hard to detect (which is true 
for the low mass MBH case), but it is sufficient that the parameter space is not too degenerate (unlike the high mass 
MBH case which is very degenerate). The second signal in TableU 1.3.2, was detected without any iterations in the 
"determine frequencies - fix them - release" phase of the search. The second medium mass binary, 1.3.3, required 
more work (one iteration), primarily because of the lower SNR of this source. 

The high mass and low mass MBH systems are more difficult to detect. The signal from the high mass MBH EMRI 
has a lot of secondary maxima which are quite strong compared to the primary. These arise because the evolution in 
these systems is very slow, so it is easy to match harmonics for long periods of time with very different parameters. 
These secondary maxima are well separated and lie all over the parameter space. The analyzed signal was even worse 



11 




60 



50 



40 



30 



20 



10 















0.5 



1.5 



2 



2.5 



3 



FIG. 4: Degeneracy in determination of the orientation of the MBH spin for the blind search of the high mass binary (1.3.1 
type). 

than usual, because the inclination of the spin of the black hole to our line of sight was such that almost all of the 
signal power was concentrated in a single m-harmonic for each n (we encountered a similar case in our blind search 
and will discuss this later). 

The low mass MBH EMRI is our canonical EMRI, as we expect these systems to dominate the event rate 0, [13] ■ 
For these systems, the harmonic frequencies evolve rather significantly over the inspiral and so a template needs to 
match both the frequency and frequency derivative of a harmonic rather accurately in order to get high SNR. This 
means there are fewer secondaries, but, at the same time, it also implies the global maximum is rather 'sharp' in 
parameter space, which is reflected in the better accuracy of recovered parameters. Usually these signals require more 
time on the first stage of the search (or a larger number of CPUs). The loudest signal (1.3.4 — fourth in the table) 
was more difficult to detect because of its orientation (A is close to 7r/2). This caused us to make an incorrect guess 
of the dominant harmonic when doing the phase maximization. We had to use an iV-harmonic template in the search 
with N = 9. The second signal (1.3.5) was not peculiar, but we had to do two iterations of the "determine - fix - 
release" part of the search since our first guess of the orbital frequencies was not very good due to the lower signal 



Since the high and low mass MBH EMRIs were the hardest to find, we decided to test the algorithm pipeline by 
performing blind tests on one data set containing a high mass MBH EMRI, and one data set containing a low mass 
MBH EMRI. We use the MLDC round IB "training" data sets 1.3.1 and 1.3.4 for this analysis, although we did not 
consult the parameter key until we had finished the search. The results are presented in Table HIl We used two criteria 
to determine the end-point of the search and claim a detection: (i) several chains converged to the same result; (ii) 
the SNR of all harmonics in these best chains was comparable to and no less than the best SNR of the corresponding 
harmonic found in all the other chains. 

For the high mass MBH signal we did not encounter any difficulties. The search with the iV-harmonic template 
resulted in quite a large error bar on the parameters because of the degeneracies in the parameter space and because 
of the relatively low SNR of this source. The degeneracy in 6k, <t>K for this source is illustrated in Figured) The 
signal was found after one iteration of the "determine - fix - release" stage. 

The signal from the low mass MBH EMRI proved much more interesting and difficult. Almost all the power was 



SNR. 



B. Blind tests 
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TABLE I: Results of the analysis of the five "blind" data sets used in MLDC Challenge IB. 3. These are 1B.3.1-1B.3.5 going 
from top to bottom. The analysis of these data sets was not blind, as it was mostly finished after the parameters were released. 
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"The columns are: radial orbital frequency, mass of the CO, mass of the MBH, eccentricity at t = 0, ecliptic co-latitude, ecliptic longitude, 
inclination angle A, spin of MBH, SNR recovered by template 



TABLE IL Results of the blind analysis of two data sets. For this analysis we used the MLDC IB. 3.1 and IB. 3. 4 "training" 
data sets. Our analysis was blind in the sense that the search was run end-to-end without reference to the parameters used to 
generate the data sets. Our results were then compared to the known parameters at the end. 
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concentrated in the m — 2 harmonic (with n — 2,3,4). The i^-statistic for 25 harmonics {I = 1, ...,5, m = —2, ...,2) 
is shown for each harmonic in the matrix below 
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(22) 



We were only able to detect three m = 2 harmonics (n = 2, 3, 4) with the chains, but a second m-harmonic is required 
in order to estimate the three frequencies. To achieve this, we used the method mentioned above: we constructed an 
A''-harmonic template that did not include the harmonics which had already been identified in the search. This allowed 
us to find the n = 2, to = 1 harmonic and hence make a preliminary estimation of all three orbital frequencies. We 
then needed three iterations of the "determine - fix - release" search to reach the final answer. At this stage, we were 
confident about the quality of the detection and, when we compared to the true parameters, we had indeed reached 
a very high accuracy for all parameters. 



V. DISCUSSION 



We have described an algorithm for the detection of EMRI signals in LISA data. This algorithm is based on 
running multiple Markov Chain Monte Carlo search chains simultaneously. However, it relies on the key refinement 
that the properties of the secondary solutions that all of the chains have identified are used together to constrain 
further movement of the chains. It is clear from the results in Tables Hl-ITTl that the algorithm is able to robustly and 
accurately find the true solution, in the simplified situation that we are searching for a single, bright EMRI source 
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buried in gaussian instrumental noise. We have successfully found the source in seven out of seven mock data streams, 
even when the parameters were such that the waveform had unusual features, e.g., orbital inclination close to A = 7r/2. 
This gives us reason to expect that the algorithm will work equally well in all comparable situations, independent of 
the parameters of the source. 

This is the first algorithm to be published in the literature that has been demonstrated to be able to detect and 
determine parameters for a "typical" EMRI si gnal — a lOM© black hole falling into a IO^Mq black hole — which 
we expect to dominate the LISA event rate (tI. llTl]. It is particularly gratifying that our parameter recovery is now 
reaching the theoretical level that was estimated from Fisher Matrix analyses [Q] - MBH mass and spin determinations 
at the level of lO"**, and sky position accuracies of 10^^. While we would expect the Fisher Matrix to accurately 
represent the shape of the global maximum for these high SNR sources, it is a purely local analysis and therefore does 
not account for the presence of secondary maxima. The fact that our algorithm can now find the global maxima from 
among the bright secondaries bodes well for using EMRI sources for high precision astrophysical observations 0, Q • 

The algorithm can still be improved further. The main problem at present is that the identification of secondaries 
in order to determine constraints on the orbital frequencies is done by hand. We stop the search chains after stage 
1, and then by hand examine the results in order to estimate suitable proposal distributions for the frequencies in 
the second stage. In principle this could all be done automatically, although it would require communication between 
different chains. If each chain had information about the best points found by all the other chains, then an adaptive 
proposal could be constructed from this information. The resulting search would be more like a population MCMC 
search. The other area of the search that could potentially be improved is the way in which we use annealing. The 
annealing scheme was borrowed almost verbatim from SMBH searches jl8| and we have not attempted to optimize 
it for the EMRI problem. While this is clearly not affecting the ultimate convergence of our search, the convergence 
speed might be improved by modifying the annealing scheme. Other MCMC variants, for instance parallel tempering, 
might also improve the efficiency of the search. Parallel tempering has been demonstrated in LISA searches to great 
effect [35|, but we have no immediate plans to include it in the search pipeline. 

The algorithm as described here has only been demonstrated for a significantly simplified scenario — detection of 
a single, bright EMRI buried in purely instrumental gaussian noise. The real LISA data stream will be very different, 
and is expected to contain many thousands of resolvable signals which will be overlapping in time and frequency, 
in addition to a noise foreground from galactic compact binaries and non-gaussian instrumental artefacts etc. It is 
not clear how well this search will perform under those circumstances, since it relies on being able to identify all 
the secondary peaks in the likelihood surface that are associated with the same signal. The relative SNRs and track 
shapes will provide powerful discriminators for this purpose, but there will inevitably be problems distinguishing a 
dim sideband harmonic of a bright source from the dominant harmonic of a similar but much more distant source. 
The best way to explore these complications is to attempt to analyse more realistic data sets. The next round of the 
MLDC, Challenge 3, includes an EMRI data set that contains five overlapping signals of low SNR. We will begin to 
explore source confusion by using that data set as a test case. As future MLDC releases become increasingly realistic, 
we will analyse them in order to demarcate where this algorithm fails and how it can be improved to cope with this 
greater realism. 
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